18 research outputs found

    Dataset search: a survey

    Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward. Comment: 20 pages, 153 references

    Towards More Usable Dataset Search: From Query Characterization to Snippet Generation

    Reusing published datasets on the Web is of great interest to researchers and developers. Their data needs may be met by submitting queries to a dataset search engine to retrieve relevant datasets. In this ongoing work towards developing a more usable dataset search engine, we characterize real data needs by annotating the semantics of 1,947 queries using a novel fine-grained scheme, to provide implications for enhancing dataset search. Based on the findings, we present a query-centered framework for dataset search, explore the implementation of snippet generation, and evaluate it with a preliminary user study. Comment: 4 pages, The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019)

    Improving searchability of datasets

    Data is one of the most important digital assets in the world thanks to its business and social value. As it is becoming increasingly available online, in order to use it effectively, we need tools that allow us to retrieve the most relevant datasets that match our information needs. Web search engines are not well suited for this task as they are designed for documents, not data. In recent years several bespoke search engines have been proposed to help with finding datasets, such as Google Dataset Search, which crawls the whole web, or DataMed, which focuses on creating an index of biomedical datasets. In this work we look closer into the problem of searching for data, using Open Data platforms as an example. We first applied a mixed-methods approach aimed at deepening our understanding of users of Open Data portals and the types of queries they issue while searching for datasets, accompanied by an analysis of search sessions over one of these data portals. Based on our findings we look into a particular problem of dataset interpretation: the meaning of numerical columns. We propose a novel approach for assigning semantic labels to numerical columns. We conclude our work with an analysis of the future work needed in the field in order to potentially improve the searchability of datasets on the web.

    luciekaffee/NumTab: NumTab v0.1

    The first released version of NumTab, including the code and a version of the finished dataset.

    TTLA

    This software performs semantic labeling of numerical columns using DBpedia (or any other knowledge graph).

    NumDB-dataset

    NumDB benchmark: a set of tables originally extracted from DBpedia, from which different value samples have been selected and to which various degrees of errors have been added in order to simulate actual tables on the Web. The dataset has been created for: Kacprzak, E., Giménez-García, J. M., Piscopo, A., Koesten, L., Ibáñez, L. D., Tennison, J., & Simperl, E. (2018, November). Making Sense of Numerical Data-Semantic Labelling of Web Tables. In European Knowledge Acquisition Workshop (pp. 163-178). Springer, Cham. A description of the data generation process is in the paper.

    Collaborative practices with structured data: do tools support what users need?

    Collaborative work with data is increasingly common and spans a broad range of activities, from creating or analysing data in a team, to sharing it with others, to reusing someone else’s data in a new context. In this paper, we explore collaboration practices around structured data and how they are supported by current technology. We present the results of an interview study with twenty data practitioners, from which we derive four high-level user needs for tool support. We compare them against the capabilities of twenty systems that are commonly associated with data activities, including data publishing software, wikis, web-based collaboration tools, and online community platforms. Our findings suggest that data-centric collaborative work would benefit from: structured documentation of data and its lifecycle; advanced affordances for conversations among collaborators; better change control; and custom data access. The findings help us formalise practices around data teamwork, and build a better understanding of people’s motivations and barriers when working with structured data.

    Dataset search: a survey

    Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.